On Approximation of Distribution and Density Functions



Hans Wolff
Abstract

Stochastic approximation algorithms for least square error approximation
to density and distribution functions are considered. The main results
are necessary and sufficient parameter conditions for the convergence of the
approximation processes and a generalization to some time-dependent density
and distribution functions.
In this paper we deal with the special approach to the estimation of
an unknown density or distribution function of a real-valued random
variable $\xi$ as developed in [1]-[6]. Using the same notation we briefly
describe this approach.

Consider the N-dimensional vector of functions $\Phi^T(x) = (\varphi_1(x), \ldots, \varphi_N(x))$.
The components $\varphi_i$, $i = 1, \ldots, N$, are assumed to be linearly independent,
square-integrable and bounded real functions on an interval $\Omega = [a, b]$ of
the real axis. If a sequence of independent observations $(x_1, x_2, \ldots)$ from
$\xi$ is available, the problem is then to find an approximation

$\hat{F}(x) = \sum_{i=1}^{N} \alpha_i \varphi_i(x) = \alpha^T \Phi(x)$

in $\Omega$ for the unknown distribution function $F(x)$, such that $\hat{F}(x)$
minimizes the integral-square-error criterion

(1)  $G_1(\alpha) = \int_\Omega [F(x) - \alpha^T \Phi(x)]^2 \, dx$

with respect to the vector of coefficients $\alpha^T = (\alpha_1, \ldots, \alpha_N)$.
The analogous estimation problem for the unknown density function $f(x)$ consists in
determining the estimator $\hat{f}(x)$,

$\hat{f}(x) = \sum_{i=1}^{N} \beta_i \varphi_i(x) = \beta^T \Phi(x)$,

such that again the integral-square-error criterion

(2)  $G_2(\beta) = \int_\Omega [f(x) - \beta^T \Phi(x)]^2 \, dx$

is a minimum with respect to $\beta$.
As can be easily shown (see e.g. [1]), minimizing (1) and (2) is
equivalent to solving the regression equations

(3)  $E\left[\int_\Omega z(\xi, y)\,\Phi(y)\,dy - A\alpha\right] = 0$

and

(4)  $E[w(\xi) - A\beta] = 0$,

respectively, where $A$ is a known N x N matrix,

$A = \int_\Omega \Phi(y)\,\Phi^T(y)\,dy$,

and $z(\xi, y)$ and $w(\xi)$ are defined as

$z(\xi, y) = \begin{cases} 1 & \text{if } \xi < y \\ 0 & \text{if } \xi > y \end{cases}$,
$w(\xi) = \begin{cases} \Phi(\xi) & \text{if } \xi \in \Omega \\ 0 & \text{if } \xi \notin \Omega \end{cases}$.

The purpose of the mentioned papers consisted in solving the parameter-dependent
regression equations (3) and (4) by the application of the stochastic
approximation theory as an appropriate method. A further goal was to give
an iterative solution in order to avoid computer storage problems. But
because of the linear independence of the $\varphi_i(x)$, $i = 1, \ldots, N$, $A^{-1}$ exists
and we can solve (3) and (4) directly:

(5)  $\alpha^* = A^{-1} E\left[\int_\Omega z(\xi, y)\,\Phi(y)\,dy\right]$,

(6)  $\beta^* = A^{-1} E[w(\xi)]$.



Therefore we have only to estimate the expectations of the parameter-independent
random variables $\int_\Omega z(\xi, y)\,\Phi(y)\,dy$ and $w(\xi)$. By simplifying
the statement of the problem we can expect stronger limiting theorems for
those procedures considered in [1]-[8]. In previous papers ([9], [10]) the
author has dealt with such iterative approximations of the expectation of
a random variable. The following process was considered.

Let $\{a_n\}$ be any sequence of real numbers restricted to $0 < a_n < 1$
for all $n$ and let $Y_n^T = (y_{n1}, \ldots, y_{nN})$ denote the $n$-th observation
of a real-valued N-dimensional random variable $\eta^T = (\eta_1, \ldots, \eta_N)$.
Then the approximation procedure $\{X_n\}$ is defined by the iteration formula

(7)  $X_{n+1} = (1 - a_{n+1})\,X_n + a_{n+1}\,Y_{n+1}$,  $n = 0, 1, \ldots$,

with an arbitrary but fixed starting point $X_0 = a \in R^N$. Theorem 1 gives
necessary and sufficient parameter conditions for the convergence of this
process.

Theorem 1: The process (7) converges under the assumption

$0 < \max_{1 \le i \le N} \operatorname{Var} \eta_i < \infty$

with probability one and in the mean to the expectation $M$ of $\eta$,

$X_n \to M$ w.p.1,  $E(X_n - M)^2 \to 0$  $(n \to \infty)$,

if and only if

(8)  $a_n \to 0$,  $\sum_{i=1}^{n} a_i \to \infty$  $(n \to \infty)$.

The parameter condition (8) is only sufficient if we admit the degenerate
and trivial case $\operatorname{Var} \eta_i = 0$, $i = 1, \ldots, N$. The proof of Theorem 1 is
given in [10].
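
The recursion (7) and the role of condition (8) can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions: it treats scalar observations (the vector case applies the same update to each component), and it uses two gain sequences chosen here for the example, $a_n = 1/(n+1)$, which satisfies (8), and $a_n = 1/(n+1)^2$, whose sum stays finite. The function names are hypothetical and do not come from the cited papers.

```python
def run_process(observations, gain, x0):
    """Iterate X_{n+1} = (1 - a_{n+1}) X_n + a_{n+1} Y_{n+1}  (formula (7)).

    `gain(n)` returns a_n for n = 1, 2, ...; `x0` is the starting point.
    """
    x = x0
    for n, y in enumerate(observations, start=1):
        a = gain(n)
        x = (1.0 - a) * x + a * y
    return x


def harmonic_gain(n):
    # Assumed example: a_n = 1/(n+1) tends to 0 and its partial sums diverge.
    return 1.0 / (n + 1)


def summable_gain(n):
    # Assumed counterexample: a_n = 1/(n+1)^2 tends to 0 but its sum is finite.
    return 1.0 / (n + 1) ** 2


if __name__ == "__main__":
    zeros = [0.0] * 1000          # degenerate data with expectation M = 0
    print(run_process(zeros, harmonic_gain, x0=10.0))   # starting point is forgotten
    print(run_process(zeros, summable_gain, x0=10.0))   # stays near 5, far from M
```

With $a_n = 1/(n+1)$ the iterate equals $(X_0 + Y_1 + \cdots + Y_n)/(n+1)$, a running average that includes the starting point. With the summable gains the weight of $X_0$ never drops below one half, so the starting point is never forgotten.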

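The application of Theorem 1 to the random variables $A^{-1}\int_\Omega z(\xi, y)\,\Phi(y)\,dy$
and $A^{-1} w(\xi)$ yields at once those estimation procedures sought for the
vectors $\alpha^*$ and $\beta^*$ and considered in [1]-[8]:

(9)  $\alpha_{n+1} = (1 - a_{n+1})\,\alpha_n + a_{n+1} A^{-1} z_{1,n+1}$,  $\alpha_0 = b \in R^N$ w.p.1,

(10)  $\beta_{n+1} = (1 - a_{n+1})\,\beta_n + a_{n+1} A^{-1} z_{2,n+1}$,  $\beta_0 = b \in R^N$ w.p.1,

where $z_{1,n}$ and $z_{2,n}$ denote the $n$-th observation of the random variables
$\int_\Omega z(\xi, y)\,\Phi(y)\,dy$ and $w(\xi)$, respectively:

$z_{1,n} = \begin{cases} \int_a^b \Phi(y)\,dy & \text{if } x_n < a \\ \int_{x_n}^b \Phi(y)\,dy & \text{if } a \le x_n \le b \\ 0 & \text{if } x_n > b \end{cases}$

$z_{2,n} = \begin{cases} \Phi(x_n) & \text{if } x_n \in \Omega \\ 0 & \text{if } x_n \notin \Omega \end{cases}$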



From Theorem 1 follows immediately:

Theorem 2: The stochastic processes defined by (9) and (10) converge with
probability one and in the mean to $\alpha^*$ and $\beta^*$, respectively, if and only
if the sequence of parameters $\{a_n\}$ fulfills condition (8).
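
Procedures (9) and (10) can be sketched for a small concrete case. The example below rests on assumptions that are not in the paper: the basis $\Phi(x) = (1, x)$ on $\Omega = [0, 1]$, for which $A = [[1, 1/2], [1/2, 1/3]]$, the gain sequence $a_n = 1/(n+1)$, the starting point 0, and a test density $f(x) = 2x$ with $F(x) = x^2$. For this choice the targets are $\alpha^* = (-1/6, 1)$ and $\beta^* = (0, 2)$. All function names are hypothetical.

```python
import random

A_INV = ((4.0, -6.0), (-6.0, 12.0))   # inverse of A = [[1, 1/2], [1/2, 1/3]]


def z1(x):
    """Observation of the integral of z(xi, y) Phi(y) over Omega = [0, 1]."""
    if x > 1.0:
        return (0.0, 0.0)
    lo = max(x, 0.0)                      # integrate Phi(y) = (1, y) from lo to 1
    return (1.0 - lo, (1.0 - lo * lo) / 2.0)


def z2(x):
    """Observation of w(xi): Phi(x) inside Omega, zero outside."""
    return (1.0, x) if 0.0 <= x <= 1.0 else (0.0, 0.0)


def estimate(samples, observe, start=(0.0, 0.0)):
    """Run c_{n+1} = (1 - a_{n+1}) c_n + a_{n+1} A^{-1} z_{n+1} with a_n = 1/(n+1)."""
    c = list(start)
    for n, x in enumerate(samples, start=1):
        a = 1.0 / (n + 1)
        z = observe(x)
        step = [sum(A_INV[i][k] * z[k] for k in range(2)) for i in range(2)]
        c = [(1.0 - a) * c[i] + a * step[i] for i in range(2)]
    return c


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed test density f(x) = 2x on [0, 1], sampled by inversion: x = sqrt(U).
    samples = [rng.random() ** 0.5 for _ in range(20000)]
    print("alpha_n:", estimate(samples, z1))   # target alpha* = (-1/6, 1)
    print("beta_n: ", estimate(samples, z2))   # target beta*  = (0, 2)
```

Passing `z1` runs procedure (9) and passing `z2` runs procedure (10); only the observed random variable differs, which reflects the fact that both procedures are instances of the same process (7).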
We mention that the following modifications of (9) and (10), suggested
for example in [1], [6], [7],

$\alpha_{n+1} = (1 - a_{n+1})\,\alpha_n + a_{n+1} A^{-1} \cdot \frac{1}{n+1} \sum_{i=1}^{n+1} z_{1,i}$,

$\beta_{n+1} = (1 - a_{n+1})\,\beta_n + a_{n+1} A^{-1} \cdot \frac{1}{n+1} \sum_{i=1}^{n+1} z_{2,i}$,

do not have a faster rate of convergence than (9) and (10) themselves, as was
erroneously asserted in [6] and [7]. The error consisted essentially in
taking $\alpha_n$ and $\frac{1}{n+1} \sum_{i=1}^{n+1} z_{1,i}$ (or $\beta_n$ and
$\frac{1}{n+1} \sum_{i=1}^{n+1} z_{2,i}$, respectively)
as independent random variables (e.g. [6], p. 15, equation (7)).



Time-dependent Density and Distribution Functions



Instead of identically distributed values $x_i$, $i = 1, 2, \ldots$, from $\xi$ we
deal now with a sample $\{x_1, x_2, \ldots\}$ corresponding to a sequence of random
variables $\{\xi_1, \xi_2, \ldots\}$, where $\xi_i$ is distributed with $F_i(x)$, $i = 1, 2, \ldots$,
representing, e.g., successive time periods. Since we want to derive an analogous
limiting theorem to that given in Theorem 2 we restrict ourselves to the
case where $\{F_i(x)\}$ converges to a limiting distribution $F(x)$ and $\{f_i(x)\}$
converges to a limiting density function $f(x)$. For this situation we have
the following corollary to Theorem 2.

Corollary: Theorem 2 holds even in the case where the observations $x_i$,
$i = 1, 2, \ldots$, are drawn from a population with a distribution function
$F_i(x)$ and a density function $f_i(x)$, if we assume

$F_i(x) \to F(x)$,  $f_i(x) \to f(x)$  $(i \to \infty)$

[$F(x)$ distribution function, $f(x)$ density function].

This corollary follows immediately from (5) and (6) and from a generalized
version of Theorem 1 given below.

Let $\{Y_i^T = (y_{i1}, y_{i2}, \ldots, y_{iN})\}$ be a sequence of independent
N-dimensional real-valued observations distributed with $\{F_i(y_1, \ldots, y_N)\}$,
respectively, and where $F_i(y_1, \ldots, y_N)$ converges to a nondegenerate limiting
distribution $F(y_1, \ldots, y_N)$. Then we have

Theorem 3: The process (7),

$X_{n+1} = (1 - a_{n+1})\,X_n + a_{n+1}\,Y_{n+1}$,  $X_0 = a$,

converges under the assumption

$\max_{1 \le i \le N} \operatorname{Var} y_{ji} \le c < \infty$,  $j = 1, 2, \ldots$

with probability one and in the mean to the expectation $M$ of $F(y_1, \ldots, y_N)$,

$X_n \to M$ w.p.1,  $E(X_n - M)^2 \to 0$  $(n \to \infty)$,

if and only if $\{a_n\}$ fulfills condition (8).
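
A small simulation can make the time-dependent setting of Theorem 3 concrete. The sketch below is only an illustration under assumptions introduced here: scalar observations, the gain sequence $a_n = 1/(n+1)$, and a hypothetical model in which the $i$-th observation is normal with mean $1 + 5/i$ and unit variance, so that the distributions $F_i$ converge to a normal distribution with mean 1.

```python
import random


def track_mean(observations, x0=0.0):
    """Process (7) with the assumed gain a_n = 1/(n+1), scalar case."""
    x = x0
    for n, y in enumerate(observations, start=1):
        a = 1.0 / (n + 1)
        x = x - a * (x - y)          # same update as (1 - a) x + a y
    return x


def drifting_sample(n, rng, limit_mean=1.0, drift=5.0):
    """Y_i ~ Normal(limit_mean + drift / i, 1): the distributions F_i change with i
    and converge to Normal(limit_mean, 1). This model is an illustrative assumption."""
    return [rng.gauss(limit_mean + drift / i, 1.0) for i in range(1, n + 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(track_mean(drifting_sample(50000, rng)))   # close to the limiting mean 1.0
```

The early observations are biased upward, but because the gains satisfy (8) their influence is averaged out and the iterate approaches the expectation of the limiting distribution.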

Because of the length of the proof of this theorem, the reader is
referred to [9] or [10]. Some problems arise if we consider the case where
$\Omega$ is the whole probability space, especially the entire real axis. In this
case it is natural to require that the approximation $\hat{f}(x)$ should satisfy
the normalization condition

$\int_\Omega \hat{f}(x)\,dx = 1$.

Unfortunately this is not true in general. To avoid this we can use Lagrange's
coefficients method as was done for orthonormal functions by
Laski [5] and for a similar problem by Nikolic and Fu [6].

Footnote: The assumption $f_i(x) \to f(x)$, where $f(x)$ is a density function, is
sufficient for $F_i(x) \to F(x)$ with $F(x)$ a distribution function (see e.g. [11]).

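Instead of (2) we now minimize the criterion

$G_3(\beta, \lambda) = \int_\Omega \left[f(x) - \sum_{i=1}^{N} \beta_i \varphi_i(x)\right]^2 dx - 2\lambda \left(\sum_{i=1}^{N} \beta_i d_i - 1\right)$,

where $\lambda$ is a Lagrange coefficient and

$d_i = \int_\Omega \varphi_i(x)\,dx$,  $0 < |d_i| < \infty$,  $i = 1, 2, \ldots, N$.

The minimization conditions

$\partial G_3 / \partial \beta_i = 0$,  $i = 1, \ldots, N$;  $\partial G_3 / \partial \lambda = 0$

yield the system of linear equations

$\sum_{i=1}^{N} a_{ki}\,\beta_i - \lambda\,d_k = E\varphi_k(x)$,  $k = 1, \ldots, N$,

$\sum_{i=1}^{N} d_i\,\beta_i = 1$,

where $A = (a_{ij})$ means the same N x N matrix as given in (4).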

From this we obtain the solution 
(11)  $\beta_j^{**} = \frac{1}{|A|} \sum_{i=1}^{N} A_{ij} \left[ E\varphi_i(x) + d_i\,\frac{|A| - \sum_{k=1}^{N} E\varphi_k(x) \sum_{l=1}^{N} d_l A_{kl}}{\sum_{k=1}^{N} d_k \sum_{l=1}^{N} d_l A_{kl}} \right]$,  $j = 1, \ldots, N$,

where $A_{ij}$ is the adjunct of $a_{ij}$.

With the abbreviations

$D_{ij} = \frac{\sum_{l=1}^{N} d_l A_{lj} \, \sum_{k=1}^{N} d_k A_{ik}}{\sum_{k=1}^{N} d_k \sum_{l=1}^{N} d_l A_{kl}}$,

$D_j = \frac{\sum_{i=1}^{N} d_i A_{ij}}{\sum_{k=1}^{N} d_k \sum_{l=1}^{N} d_l A_{kl}}$,

$C_{ij} = \frac{1}{|A|} (A_{ij} - D_{ij})$,

we can rewrite (11):

$\beta_j^{**} = D_j + \sum_{i=1}^{N} C_{ij}\,E\varphi_i(x)$.
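
The closed form $\beta_j^{**} = D_j + \sum_i C_{ij} E\varphi_i(x)$ can be checked numerically for a small case. The Python sketch below is an illustration only: the matrix, the vector $d$ and the vector `m` standing for the expectations $E\varphi_i(x)$ are arbitrary numbers chosen here, and the helper names are hypothetical. Determinants are computed by Laplace expansion, which is adequate only for small N.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for small N)."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def adjunct(m, i, j):
    """Adjunct (cofactor) A_ij of the element a_ij."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)


def constrained_coefficients(a, d, m):
    """beta_j** = D_j + sum_i C_ij m_i, where m_i stands for E phi_i(x)."""
    n = len(a)
    adj = [[adjunct(a, i, j) for j in range(n)] for i in range(n)]
    det_a = det(a)
    col = [sum(d[l] * adj[l][j] for l in range(n)) for j in range(n)]   # sum_l d_l A_lj
    row = [sum(d[k] * adj[i][k] for k in range(n)) for i in range(n)]   # sum_k d_k A_ik
    s = sum(d[k] * row[k] for k in range(n))              # sum_k d_k sum_l d_l A_kl
    big_d = [col[j] / s for j in range(n)]                # D_j
    c = [[(adj[i][j] - col[j] * row[i] / s) / det_a for j in range(n)]
         for i in range(n)]                               # C_ij
    return [big_d[j] + sum(c[i][j] * m[i] for i in range(n)) for j in range(n)]


if __name__ == "__main__":
    a = [[2.0, 1.0], [1.0, 3.0]]      # arbitrary symmetric positive definite matrix
    d = [1.0, 2.0]                    # arbitrary nonzero d_i
    m = [0.3, 0.5]                    # arbitrary stand-in for E phi_i(x)
    beta = constrained_coefficients(a, d, m)
    print(beta, sum(di * bi for di, bi in zip(d, beta)))   # second value is 1 up to rounding
```

For these numbers the result satisfies the normalization condition $\sum_i d_i \beta_i = 1$, and $A\beta - m$ is proportional to $d$, as the linear system with the Lagrange coefficient requires.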



From Theorem 1 it follows at once that the stochastic processes defined by

(12)  $\gamma_{n+1}^{(j)} = (1 - a_{n+1})\,\gamma_n^{(j)} + a_{n+1} \left[ D_j + \sum_{i=1}^{N} C_{ij}\,\varphi_i(x_{n+1}) \right]$,

      $\gamma_0^{(j)} = b_j \in R^1$,  $j = 1, \ldots, N$,

converge to $\beta_j^{**}$, $j = 1, \ldots, N$, with probability one and in the mean if and
only if the parameter condition (8) is fulfilled. To avoid unnecessary
computations we estimate the parameters $B_j = \beta_j^{**} - D_j$. The final form of the
sequential estimation of the unknown vector of parameters
$B^T = (B_1, \ldots, B_N)$ is then

(13)  $\gamma_{n+1} = (1 - a_{n+1})\,\gamma_n + a_{n+1}\,C\,\Phi(x_{n+1})$,  $\gamma_0 = b \in R^N$,

where $C$ is the N x N matrix $C = (C_{ij})$.

Theorem 4: The process (13) converges to the vector $B$ with probability
one and in the quadratic mean iff the parameter sequence $\{a_n\}$ satisfies
condition (8).

We give a simple application. Consider a mixture

$p(x) = \sum_{i=1}^{N} \beta_i\,p_i(x)$,  $\sum_{i=1}^{N} \beta_i = 1$,

of density functions $p_i(x)$, $i = 1, \ldots, N$. The set of functions $\{p_i(x)\}$
is assumed to be known and to be linearly independent on $\Omega$. Furthermore a
sequence of independent observations $\{x_1, \ldots, x_n\}$, identically distributed
with $p(x)$, may be available from which we want to estimate the parameters
$\beta_i$, $i = 1, \ldots, N$. This decomposition of a mixture can be done by our
sequential estimation procedure (12) or (13). Because $d_i$ equals 1,
$i = 1, \ldots, N$, we get simpler formulas for the $D_{ij}$ and $D_j$:



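$D_{ij} = \frac{\sum_{l=1}^{N} A_{lj} \, \sum_{k=1}^{N} A_{ik}}{\sum_{k=1}^{N} \sum_{l=1}^{N} A_{kl}}$,
$D_j = \frac{\sum_{i=1}^{N} A_{ij}}{\sum_{k=1}^{N} \sum_{l=1}^{N} A_{kl}}$,
$i, j = 1, \ldots, N$.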



The stochastic processes (12) and (13) converge to the unknown parameters
$\beta_j$, $j = 1, \ldots, N$, and $B_j = \beta_j - D_j$, respectively.
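
The mixture decomposition can be sketched for two components. The following Python example is an illustration under assumptions made here, not taken from the paper: $\Omega = [0, 1]$, the component densities $p_1(x) = 1$ and $p_2(x) = 2x$, the gain sequence $a_n = 1/(n+1)$ and the starting point 0. For this pair the matrix of inner products is $[[1, 1], [1, 4/3]]$, and the formulas with $d_1 = d_2 = 1$ give $D = (1, 0)$ and $C = [[3, -3], [-3, 3]]$; these values were worked out by hand for the example. The function names are hypothetical.

```python
import random

# Assumed component densities on Omega = [0, 1]: p_1(x) = 1 and p_2(x) = 2x.
# A = [[1, 1], [1, 4/3]], |A| = 1/3, adjuncts A_11 = 4/3, A_12 = A_21 = -1, A_22 = 1.
# With d_1 = d_2 = 1 the formulas give D = (1, 0) and C = [[3, -3], [-3, 3]].
D = (1.0, 0.0)
C = ((3.0, -3.0), (-3.0, 3.0))


def components(x):
    """Values of the assumed component densities (p_1(x), p_2(x))."""
    return (1.0, 2.0 * x)


def estimate_weights(samples):
    """Sequential estimate of B with gain a_n = 1/(n+1) and starting point 0;
    returns the mixture weights beta_j = D_j + B_j."""
    b = [0.0, 0.0]
    for n, x in enumerate(samples, start=1):
        a = 1.0 / (n + 1)
        p = components(x)
        step = [sum(C[i][j] * p[i] for i in range(2)) for j in range(2)]
        b = [(1.0 - a) * b[j] + a * step[j] for j in range(2)]
    return [D[j] + b[j] for j in range(2)]


def draw(rng, beta2):
    """One observation from the mixture (1 - beta2) * p_1 + beta2 * p_2."""
    u = rng.random()
    return u ** 0.5 if rng.random() < beta2 else u


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [draw(rng, 0.3) for _ in range(40000)]
    print(estimate_weights(samples))   # estimates of (beta_1, beta_2); true values (0.7, 0.3)
```

Because the columns of $C$ are weighted by $d$ to zero and $\sum_j d_j D_j = 1$, the estimated weights in this sketch sum to one at every step, which is the purpose of the Lagrange construction.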